16/05/2022

Co-authors

Warren Currie, Debora Laura

Plan

  • Epistemology: a brief and idiosyncratic overview
  • Mechanism and novel conditions
  • How can we use models more effectively?
    • machine learning models: big data and mechanism (Maxent)
    • mathematical models: mining models for data collection directions? (Laubmeier)
    • statistical models: using mathematical models to understand alternative mechanisms (Quinte)

Epistemology: How do we gain knowledge?

Gaining knowledge in science

  • reality
  • data: a biased subset of reality
  • opinion: what we believe about reality

\(\rightarrow\)Science: an attitude linking belief and data, whereby we do not, at least in principle, maintain beliefs that are not supported by data

Scientific Theory

  • We may refer to beliefs supported by data, or which at least do not always contradict data, as theories

  • We would like theories to have a few other properties, such as:

    • logical consistency
    • coherence with other scientific theories

Where do models fit in?

  • Model: a representation of reality

  • A structure that:

    • embodies some of our beliefs about reality
      e.g., predators negatively impact prey populations \(\frac{dN}{dt}=f(N)-g(N,P)\)
    • mimics some aspect of data
      e.g., linear regression \(y_i=\beta_0+\beta_1x_i+\epsilon\)
    • combines these two components (e.g., makes a statement about the expected pattern of data in light of theory)
      predator consumption rate can be described as a type II functional response: \(\frac{g(N,P)}{P}=\frac{aN}{N+N_0}\)
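These pieces can be put together numerically. Here is a minimal sketch (all parameter values are illustrative assumptions, not from the talk) that integrates the prey equation with logistic growth \(f(N)\) and the type II consumption term \(g(N,P)\):

```python
# Minimal sketch: forward-Euler integration of dN/dt = f(N) - g(N, P) with
# logistic growth f and a type II functional response g. All parameter values
# (r, K, a, N0, P) are illustrative assumptions.

def prey_rate(N, P, r=1.0, K=100.0, a=5.0, N0=20.0):
    """Net prey growth rate: logistic growth minus type II predation."""
    f = r * N * (1 - N / K)       # prey growth in the absence of predators
    g = a * N * P / (N + N0)      # total consumption by P predators (type II)
    return f - g

def simulate(N=50.0, P=2.0, dt=0.01, steps=5000):
    """Integrate forward with Euler steps; returns the final prey density."""
    for _ in range(steps):
        N = max(N + dt * prey_rate(N, P), 0.0)
    return N
```

With a fixed predator density, the simulated prey settle at the density where logistic growth exactly balances type II consumption.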

Types of models

  • conceptual (e.g., a statement)

  • physical (e.g., lab experiment)

  • mathematical (e.g., ODE)

  • data-driven (e.g., regression)

  • computational (e.g., IBM)

“predators can positively impact prey”

Bell & Cuddington 2018


\(\frac{dN}{dt}=f(N,E)+g(N,P,E)\)

\(E(y_i)=\beta_0+f(x_i)\)

Cuddington & Yodzis 1999


Characteristics of models

  • trade off precision, generality and realism (Levins 1966).
  • it is not possible to include all details of a system and still have a useful tool (e.g., a one-to-one scale map of a city may include all details but is useless as a guide to finding your hotel)

“We actually made a map of the country, on the scale of a mile to the mile!”
“Have you used it much?” I enquired.
“It has never been spread out, yet,” said Mein Herr,
“the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

Lewis Carroll - The Complete Illustrated Works. Gramercy Books, New York (1982)

  • therefore, models are always false in some aspects of their representation of theory or data

Complexity is not necessarily better

  • think about the map example: complexity is not necessarily helpful for explanation
  • complexity is not necessarily helpful for prediction either
  • complex models are prone to overfitting

“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

John von Neumann

  • don’t be impressed when a complex model fits a data set well. With enough parameters, you can fit any data set
Mayer et al. 2010


Overfitting

  • but it fits, right??
  • No, not really. Overfitting occurs when a data-driven model tries to cover all the data points in the dataset
  • as a result, the model starts capturing noise and inaccurate values present in the dataset, and these factors reduce the efficiency and accuracy of the model
Lever et al. 2016
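The elephant quip can be demonstrated with a toy example (entirely synthetic; the polynomial-degree setup is an assumption for illustration): as the degree grows, training error collapses toward zero while error at fresh points from the same process does not.

```python
# Synthetic overfitting demo: fit polynomials of increasing degree to 10 noisy
# points from a quadratic trend. Degrees, noise level, and seed are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def truth(x):
    return 4 * x * (1 - x)                      # underlying quadratic trend

x_train = np.linspace(0, 1, 10)
y_train = truth(x_train) + rng.normal(0, 0.1, x_train.size)

# fresh data from the same process, at points between the training x's
x_test = (x_train[:-1] + x_train[1:]) / 2
y_test = truth(x_test) + rng.normal(0, 0.1, x_test.size)

def mse(degree):
    """(train, test) mean squared error for a polynomial fit of this degree."""
    coef = np.polyfit(x_train, y_train, degree)
    def err(x, y):
        return float(np.mean((np.polyval(coef, x) - y) ** 2))
    return err(x_train, y_train), err(x_test, y_test)
```

A degree-9 polynomial passes through all 10 training points (train error near zero), yet its error on the in-between points is far larger: it has cached the noise.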


\(\rightarrow\)Complex models \(\neq\) accurate prediction

Simplicity is not necessarily better

  • Simple models \(\neq\) reality/truth
    (fish stock models ignoring environmental variation?)

\(\rightarrow\) Model \(\neq\) reality

Characteristics of models

\(\rightarrow\) Theory \(\neq\) Model \(\neq\) Reality

Why do we need models then?

  1. Explanation
  2. Prediction

although we often attempt to make do without it, both of these functions require mechanism

All models can include phenomenological or mechanistic components, or both

Valle et al. (2009) found that alternate modeling assumptions in the forest stand simulation model SYMFOR can account for 66–97% of the variance in predicted stand dynamics. While the authors were able to complete this comparison of differing model formulations for SYMFOR, they note that it may be very difficult to do the same for exceedingly complex models.

The role of mechanism in modelling

Data-driven models, by themselves, are generally not mechanistic

  • I’m including here standard statistical models (both frequentist and Bayesian), as well as machine learning models
  • we might look for a relationship

Because these models are based on causal mechanisms rather than correlation, our confidence in extrapolating beyond known data is enhanced. Of course, there is always uncertainty about how an ecological process will interact with novel global change conditions.

However, under conditions of global change, models based on the past behaviour of a system may not be suitable for projection forward (Williams et al. 2007, Lawler et al. 2010).

Models without mechanism provide no useful explanatory information

Predators negatively impact their prey

  • how?

  • fear dynamics?

  • benefitting competitors

  • when, always?

  • is that true for generalist predators and specialists?

  • what about predators that eat prey competitors?

  • what about predators that modify the environment?

Without mechanism, we can’t answer these questions, and that’s a problem, since if we get the answer wrong, we might take an action that has the opposite of the desired effect

Mathematical models and mechanism

  • starting from \(\frac{dN}{dt}=f(N)-g(N,P)\) is no different than starting from \(y_i=\beta_0 + f(x_i)+\epsilon\) in terms of mechanism

  • mechanism requires an explanation or idea about how predators negatively impact the net prey population growth rate (what is g(N, P)?)

-we can leave g(N,P) to be a mere description of phenomena, or we can examine natural data closely, devise experiments, or reason logically to develop ideas about mechanism (e.g., Holling REF)

Mathematical models and mechanism

  • once the function is specified, the model can also suggest expected behaviour for given conditions within the domain of application (“this is a two-species model!”, “well, it might work okay for agricultural fields”), or be used to make guesses outside of this domain (“true, but the interaction strength between these two species is really large compared to everything else”)

  • one advantage, though: the model can include a variety of functional forms that pertain to different mechanisms… we have to stop forgetting this!!

i.e., Lotka-Volterra pred-prey model (\(\frac{dN}{dt}=rN-aNP\)), Rosenzweig-MacArthur pred-prey model (\(\frac{dN}{dt}=rN(1-\frac{N}{K})-\frac{aNP}{N+N_0}\))
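The mechanistic difference between the two formulations is the consumption term. A small sketch (parameter values are illustrative assumptions) contrasting the per-predator consumption rates:

```python
# Per-predator consumption terms behind the two models above; parameter values
# (a, N0) are illustrative assumptions.

def type_I(N, a=0.1):
    """Lotka-Volterra consumption aN: linear in prey density."""
    return a * N

def type_II(N, a=10.0, N0=50.0):
    """Holling type II consumption aN/(N + N0): saturates at a."""
    return a * N / (N + N0)
```

Type I grows without bound in prey density, while type II levels off at the maximum consumption rate a, reflecting the handling-time mechanism that the Rosenzweig-MacArthur form incorporates.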

-yes, these models are false

Claim 1: Mathematical models need to focus on mechanism rather than classification

  • more generally, mathematical models often tend to appeal to classifications of species interactions, a platonic ideal if you will, which may or may not exist
    e.g., a “predator-prey” model

  • these models have dubious explanatory value outside of the exemplar classification system

  • a “predator-prey” model supposes there is a class of predator-prey interactions that have general properties across species, systems and time that are related to the outcome of the interaction (-/+)

  • as I will discuss, the net effect of pairwise species interactions is not fixed.

  • Nor indeed is there a fixed net effect of the interaction of a species with its environment or community, etc.

\(\leftarrow\) this is the first reason I don’t think this is a helpful approach

“… missing from machine learning approaches, their oversimplified assumptions and extremely specific nature prohibit the universal predictions achievable by machine learning.” Baker et al. (2018). Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biology Letters, 14(5), 20170660. https://doi.org/10.1098/rsbl.2017.0660

Any attempt by machine learning technologies to predict individual patient outcomes from past observations using a patient database is potentially able to identify which of existing treatments is most adequate, but intrinsically unable to suggest new treatment protocols or to provide accurate predictions for new treatments. In the literature, this aspect is referred as the ‘inductive capability’ of the learning algorithms (from past data, one can identify patterns happening in the data). This is vastly different from the deductive capability of mechanistic models, in which the combination of logical (mechanistic) principles enables extrapolation to predictions about behaviours not present in the original data [4]. In short, mechanistic models can provide insights and understanding into the mechanistic functions of treatments, and these are necessary to overcome the limitations of machine learning predictions

I haven’t seen an example of “universal predictions achievable by machine learning” in ecology but I am certainly of the mind that both approaches are useful IN THEIR DOMAIN OF APPLICATION

Mechanistic mathematical models

  • simplified mathematical formulations of causal mechanisms
  • developing and/or using analytical tools to determine whether the range of possible input–output behaviours predicted by the model, and hence the causal hypotheses, are consistent with experimental observations

Others have the opposite view:

“While mechanistic models provide the causality missing from machine learning approaches, their oversimplified assumptions and extremely specific nature prohibit the universal predictions achievable by machine learning.” (Baker et al. 2018)

Example: Rusty crayfish and endangered Hine’s emerald

  • instead of classifying phenomena, we should focus on incorporating mechanisms, which may generalize across species, systems and time, e.g., \(\frac{dN}{dt}=f(N,E)+g(N,P,E)\) where g(N,P,E) could be positive or negative

(e.g., invasive rusty crayfish eat endangered Hine’s emerald dragonfly larvae)

Models without mechanism cannot reliably extrapolate to novel conditions

Data-driven models usually lack mechanism

  • statistical and machine-learning models can only make predictions that relate to patterns within the data supplied

can we use data-driven models to develop mechanistic explanations?

Maxent algorithm

  • a machine learning method, which iteratively builds multiple models. It has two main components:
  1. Entropy: the model is calibrated to find the distribution that is most spread out, or closest to uniform throughout the study region.

  2. Constraints: the rules that constrain the predicted distribution. These rules are based on the values of the environmental variables (called features) of the locations where the species has been observed.
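A toy version of these two components (an assumed, one-feature illustration, not the actual Maxent software): start from the uniform, maximum-entropy distribution over the study region, then tilt it with an exponential weight just enough that the expected feature value matches the mean feature value at the presence locations.

```python
# Toy one-feature maximum-entropy sketch (illustrative, not the Maxent
# software): among distributions over the study region, find the one closest
# to uniform whose expected feature value matches the presence mean.
import math

def maxent_1d(features, presence_mean, lr=0.1, steps=2000):
    """Fit p_i proportional to exp(lam * f_i) so that E_p[f] = presence_mean."""
    lam = 0.0                                   # lam = 0 gives the uniform distribution
    for _ in range(steps):
        w = [math.exp(lam * f) for f in features]
        z = sum(w)
        p = [wi / z for wi in w]
        expected = sum(pi * fi for pi, fi in zip(p, features))
        lam += lr * (presence_mean - expected)  # tilt toward the constraint
    return p

# study region: five cells with a "temperature" feature; presences average 3.0
features = [0.0, 1.0, 2.0, 3.0, 4.0]
p = maxent_1d(features, presence_mean=3.0)
```

The result is the least-committal distribution consistent with the constraint: probability rises smoothly toward the cells with feature values near the presence mean, rather than piling onto a single cell.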

Maxent modelling for giant hogweed distribution

  • use experimental data to suggest candidate predictors: may require cold stratification, prefer moist sites

  • initial Maxent model to find strong candidates and eliminate correlated predictors (normally we would leave these in and assume that the penalization would take care of correlation)

Develop data-driven models, with an eye to mechanism

  • constrain Maxent functions to forms that mimic standard ectotherm thermal relationships (again, these are data-based)
  • train on a global dataset, test both inside and outside of the training data range

Data-driven models which suggest mechanism

Let’s use data-driven modelling to identify mechanism

  • giant hogweed: the beginnings of a mechanistic model:
    • requirements for cold seed stratification temperatures to break dormancy
    • with development delays above this range
  • this does require constraints, previous experiments, and some logical connections

can we use mechanistic models to better understand data?

Regime shifts in bistable systems

  • a “sudden” change in state, e.g., Scheffer (2001) \(\frac{dx}{dt}=\frac{hx^\rho}{x^\rho+c}-b x+a\)
  • lake system moves from a phytoplankton-dominated, eutrophic green-water state to a macrophyte-dominated, oligotrophic state
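The bistability behind such shifts can be checked numerically. A minimal sketch of the Scheffer (2001) form above, with illustrative (assumed) parameter values chosen to produce two stable states:

```python
# Numerical sketch of bistability in the Scheffer (2001) form above. Parameter
# values (a, b, h, c, rho) are assumptions chosen to produce two stable states.

def dxdt(x, a=0.1, b=0.6, h=1.0, c=1.0, rho=8.0):
    """dx/dt = h*x^rho / (x^rho + c) - b*x + a."""
    return h * x**rho / (x**rho + c) - b * x + a

def equilibria(lo=0.0, hi=3.0, n=3000, tol=1e-10):
    """Roots of dxdt: scan a grid for sign changes, then bisect each bracket."""
    roots = []
    step = (hi - lo) / n
    for i in range(n):
        x0, x1 = lo + i * step, lo + (i + 1) * step
        if dxdt(x0) * dxdt(x1) < 0:
            while x1 - x0 > tol:
                mid = (x0 + x1) / 2
                if dxdt(x0) * dxdt(mid) <= 0:
                    x1 = mid
                else:
                    x0 = mid
            roots.append((x0 + x1) / 2)
    return roots
```

With these values the scan finds three equilibria: a low stable state, an unstable threshold between them, and a high stable state, exactly the structure needed for either route to a regime shift.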

Two ways to get a regime shift

  1. Erode stability of one eq’m, or
  2. Push system to second stable basin with a disturbance

Wait… is that all the mechanistic model predicts?

  • asymptotic vs transient dynamics predicted by models

Bay of Quinte

  • history of becoming increasingly eutrophic
  • phosphorus controls implemented in 1978
  • invaded by zebra mussels in 1994
  • mesotrophic following this

Bay of Quinte before and after mussel invasion


Standard explanation: Disturbance shift to new stable state

“In the mid-1990s, zebra and quagga mussels (Dreissena spp.) invaded the area, dramatically changing the water clarity because of the filter-feeding capacity.”

Bay of Quinte remedial action plan (2017)

Long transients in a regime shift model

Alternative explanations for change in Bay of Quinte

  • all of which arise from the SAME mechanistic model

    1. a regime shift to a 2nd stable state caused by the disturbance of the zebra mussel invasion
    2. a long transient following the erosion of the stability of the eutrophic state because of a lingering ghost attractor
    3. there was just a slow change in phosphorus (i.e., the system does not have bistable dynamics)

Examine alternatives using data-driven models:

  • Linear breakpoint analysis \(E(y_i)=\beta_0+\text{break}_i+x_i\)
  • Nonlinear analysis: use a generalized additive model (GAM: \(E(y_i)=\beta_0+f(x_i)\)), and examine the first derivative of the fitted smooth to find periods of rapid change
Simulated data from Scheffer model (2001) where the high turbidity state, which is the initial condition, is no longer stable, showing the timeseries (points), fitted GAM model (black line) and 95% credible interval (green lines) for three different levels of additive noise (a-c), then taking the first derivative (d-f) of fitted GAM (black line), with simultaneous confidence intervals (green lines). Where the derivative significantly deviates from zero, we have a period of rapid change.
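The linear breakpoint analysis can be sketched on synthetic data (the series, break year, trend, step size and noise below are assumptions for illustration, not the Bay of Quinte data): fit an intercept shift plus trend at each candidate year and keep the break that minimizes the residual sum of squares.

```python
# Breakpoint sketch on synthetic data: E(y) = b0 + step(t >= break) + b1*t,
# with the break year chosen by profiling the residual sum of squares.
# The simulated break year (1994), trend, and noise level are assumptions.
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(1972, 2012, dtype=float)
y = 0.02 * (t - t.min()) + 1.0 * (t >= 1994) + rng.normal(0, 0.1, t.size)

def sse_at(brk):
    """Residual sum of squares for an intercept shift at year brk."""
    X = np.column_stack([np.ones_like(t), (t >= brk).astype(float), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

candidates = t[2:-2]               # keep a few observations on each side
best = min(candidates, key=sse_at)
```

On this synthetic series the profiled minimum lands at the simulated break; on real data the same profile can also be disappointingly flat when change is slow rather than abrupt.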


1. examine the dynamics of a mechanistic driver
(phosphorus)

2. examine the dynamics of the disturbance
(zebra mussels)

3. examine the dynamics of the response
(water clarity)

  • linear breakpoint model \(E(\text{light}_{i,s})=\beta_s+\text{break}_{i,s}+\text{year}_{i,s}+TP_i\)
  • response to phosphorus controls in the 70s at Belleville
  • maybe rapid change after mussels at Hay Bay?


3. examine the dynamics of the response
(water clarity)

  • nonlinear analysis: \(E(\text{light}_{i,s})=\beta_s+f(\text{year}_{i,s})+f(TP_i)\)
  • suggests rapid change at Belleville and Hay Bay
  • rapid change magnitude becomes pretty small when we control for concurvity in phosphorus impacts


Conclusion: Probably just slow change and a small disturbance

  • however, it could be that the system DOES have bistable dynamics, but either
  1. the parameter values are in a regime such that there is no sudden change, or
  2. the parameter values DO allow sudden change, but there is a long transient before that change
  • in either case, zebra mussels are only likely to contribute as a small-scale disturbance

Lessons for using models from the Quinte project

  1. there are all kinds of dynamic behaviours predicted by even very simple mechanistic models (e.g., transients can be very long)
  2. it is going to be tough to determine mechanism in light of this variety of behaviour… but we NEED to because of the management questions
  3. support analysis with a variety of data-driven models at different temporal and spatial scales

Laubmeier

can we use physical models to better understand theory?

Prediction based on explanation: Mechanistic models

-often mathematical, but need not be so.

Case studies in connecting theory and data

Moving from theory to prediction?

Environmental stochasticity or variation in parameter values might lead to amplification of disturbances [24] or differences in expected dynamics. For example, varying parameter values in a differential equation model can determine whether a monotonic or oscillating approach to a stable equilibrium is expected (Box 1). Therefore, uncertainty in parameter values will lead to uncertainty about which dynamic behaviors are most likely.
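The Box 1 point can be made concrete for a two-variable model: linearize at the equilibrium and check the eigenvalues of the resulting 2×2 community matrix; a complex pair gives an oscillating approach, real negative eigenvalues a monotonic one. The example matrices below are illustrative assumptions.

```python
# Sketch: eigenvalues of a 2x2 community matrix [[a, b], [c, d]] decide the
# approach to a stable equilibrium (complex -> oscillatory, real -> monotonic).
# The example matrices are illustrative assumptions.
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues from the trace/determinant formula."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def approach(a, b, c, d):
    """'oscillatory' if the eigenvalues are complex, else 'monotonic'."""
    l1, _ = eigenvalues_2x2(a, b, c, d)
    return "oscillatory" if abs(l1.imag) > 1e-12 else "monotonic"
```

Because the sign of tr² − 4·det flips as parameters move, uncertainty in parameter values translates directly into uncertainty about which dynamic behaviour to expect.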

Open Science



What do you think?
